Aralex: a lexical database for Modern Standard Arabic.

نویسندگان

  • Sami Boudelaa
  • William D Marslen-Wilson
چکیده

In this article, we present a new lexical database for Modern Standard Arabic: Aralex. Based on a contemporary text corpus of 40 million words, Aralex provides information about (1) the token frequencies of roots and word patterns, (2) the type frequency, or family size, of roots and word patterns, and (3) the frequency of bigrams, trigrams in orthographic forms, roots, and word patterns. Aralex will be a useful tool for studying the cognitive processing of Arabic through the selection of stimuli on the basis of precise frequency counts. Researchers can use it as a source of information on natural language processing, and it may serve an educational purpose by providing basic vocabulary lists. Aralex is distributed under a GNU-like license, allowing people to interrogate it freely online or to download it from www.mrc-cbu.cam.ac.uk:8081/aralex.online/login.jsp.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A Lexical Database for Modern Standard Arabic Interoperable with a Finite State Morphological Transducer

Current Arabic lexicons, whether computational or otherwise, make no distinction between entries from Modern Standard Arabic (MSA) and Classical Arabic (CA), and tend to include obsolete words that are not attested in current usage. We address this problem by building a large-scale, corpus-based lexical database that is representative of MSA. We use an MSA corpus of 1,089,111,204 words, a pre-a...

متن کامل

An Open-Source Finite State Morphological Transducer for Modern Standard Arabic

We develop an open-source large-scale finitestate morphological processing toolkit (AraComLex) for Modern Standard Arabic (MSA) distributed under the GPLv3 license.1 The morphological transducer is based on a lexical database specifically constructed for this purpose. In contrast to previous resources, the database is tuned to MSA, eliminating lexical entries no longer attested in contemporary ...

متن کامل

Aralex: A lexical database for Modern Standard Arabic

Psycholinguistic databases, providing statistical information such as word frequency, length, and imageability, have proved to be invaluable tools for the experimental investigation of the cognitive processes underlying language functions and for the design of language assessment tools for both educational and clinical purposes (Lété, Sprenger-Charolles, & Colé, 2004; Stadthagen-Gonzalez & Davi...

متن کامل

Developing LMF-XML Bilingual Dictionaries for Colloquial Arabic Dialects

The Linguistic Data Consortium and Georgetown University Press are collaborating to create updated editions of bilingual dictionaries that had originally been published in the 1960’s for English-speaking learners of Moroccan, Syrian and Iraqi Arabic. In their first editions, these dictionaries used ad hoc Latin-alphabet orthography for each colloquial Arabic dialect, but adopted some properties...

متن کامل

The Architecture Of A Standard Arabic Lexical Database: Some Figures, Ratios And Categories From The DIINAR.1 Source Program

This paper is a contribution to the issue – which has, in the course of the last decade, become critical – of the basic requirements and validation criteria for lexical language resources in Standard Arabic. The work is based on a critical analysis of the architecture of the DIINAR.1 lexical database, the entries of which are associated with grammar-lexis relations operating at word-form level ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • Behavior research methods

دوره 42 2  شماره 

صفحات  -

تاریخ انتشار 2010